101 research outputs found
LCNN: Lookup-based Convolutional Neural Network
Porting state of the art deep learning algorithms to resource constrained
compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose
a fast, compact, and accurate model for convolutional neural networks that
enables efficient learning and inference. We introduce LCNN, a lookup-based
convolutional neural network that encodes convolutions by few lookups to a
dictionary that is trained to cover the space of weights in CNNs. Training LCNN
involves jointly learning a dictionary and a small set of linear combinations.
The size of the dictionary naturally traces a spectrum of trade-offs between
efficiency and accuracy. Our experimental results on ImageNet challenge show
that LCNN can offer 3.2x speedup while achieving 55.1% top-1 accuracy using
AlexNet architecture. Our fastest LCNN offers 37.6x speed up over AlexNet while
maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speed ups at
inference, but it also enables efficient training. In this paper, we show the
benefits of LCNN in few-shot learning and few-iteration learning, two crucial
aspects of on-device training of deep learning models.Comment: CVPR 1
Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images
In this paper, we study the challenging problem of predicting the dynamics of
objects in static images. Given a query object in an image, our goal is to
provide a physical understanding of the object in terms of the forces acting
upon it and its long term motion as response to those forces. Direct and
explicit estimation of the forces and the motion of objects from a single image
is extremely challenging. We define intermediate physical abstractions called
Newtonian scenarios and introduce Newtonian Neural Network () that learns
to map a single image to a state in a Newtonian scenario. Our experimental
evaluations show that our method can reliably predict dynamics of a query
object from a single image. In addition, our approach can provide physical
reasoning that supports the predicted dynamics in terms of velocity and force
vectors. To spur research in this direction we compiled Visual Newtonian
Dynamics (VIND) dataset that includes 6806 videos aligned with Newtonian
scenarios represented using game engines, and 4516 still images with their
ground truth dynamics
ELASTIC: Improving CNNs with Dynamic Scaling Policies
Scale variation has been a challenge from traditional to modern approaches in
computer vision. Most solutions to scale issues have a similar theme: a set of
intuitive and manually designed policies that are generic and fixed (e.g. SIFT
or feature pyramid). We argue that the scaling policy should be learned from
data. In this paper, we introduce ELASTIC, a simple, efficient and yet very
effective approach to learn a dynamic scale policy from data. We formulate the
scaling policy as a non-linear function inside the network's structure that (a)
is learned from data, (b) is instance specific, (c) does not add extra
computation, and (d) can be applied on any network architecture. We applied
ELASTIC to several state-of-the-art network architectures and showed consistent
improvement without extra (sometimes even lower) computation on ImageNet
classification, MSCOCO multi-label classification, and PASCAL VOC semantic
segmentation. Our results show major improvement for images with scale
challenges. Our code is available here: https://github.com/allenai/elasticComment: CVPR 2019 oral, code available https://github.com/allenai/elasti
RICH AND EFFICIENT VISUAL DATA REPRESENTATION
Increasing the size of training data in many computer vision tasks has shown to be very effective. Using large scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers) one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large scale image sets is intense per se. They take a significant amount of memory that makes it impossible to process the images with complex algorithms on single CPU machines. Finding an efficient image representation can be a key to attack this problem. A representation being efficient is not enough for image understanding. It should be comprehensive and rich in carrying semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large scale image classification. We present learning techniques to learn the binary features from supervised image set (With different types of semantic supervision; class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representation
Scalable Object-Class Search via Sparse Retrieval Models and Approximate Ranking
In this paper we address the problem of object-class retrieval in large image data sets: given a small set of training examples defining a visual category, the objective is to efficiently retrieve images of the same class from a large database. We propose two contrasting retrieval schemes achieving good accuracy and high efficiency. The first exploits sparse classification models expressed as linear combinations of a small number of features. These sparse models can be efficiently evaluated using inverted file indexing. Furthermore, we introduce a novel ranking procedure that provides a significant speedup over inverted file indexing when the goal is restricted to finding the top-k (i.e., the k highest ranked) images in the data set. We contrast these sparse retrieval models with a second scheme based on approximate ranking using vector quantization. Experimental results show that our algorithms for object-class retrieval can search a 10 million database in just a couple of seconds and produce categorization accuracy comparable to the best known class-recognition systems
- …